See copyright notice at the bottom of this page.
How are Runs Really Created
August 12, 2002 - Mike Tamada
Very nice article. One thing I wonder about, though: sure, the negative value of an out is smaller in a Pedro Martinez environment than in a Chan Ho Park environment (talk about Park effects!), as measured in terms of run expectancy.
But what about *win* expectancy?
That is, the formulas all seem to be geared to calculating what happens to a team's expected number of runs. But the VALUE of those runs will presumably be different in the different environments, just as the value of an out or a single is different in the different environments.
Possible example: the value of a home run. In terms of expected runs added, it might be higher in a Chan Ho Park environment than in a Pedro Martinez environment -- there are likely to be more players on base when the HR is hit. But in terms of expected wins added, I'd guess that the HR against Pedro is exceptionally valuable, whereas the HR against Chan Ho is, well, more Ho-hum.
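A rough way to check that guess is sketched below. It assumes each team's runs per game follow independent Poisson distributions, which is only a crude approximation of real scoring, and the two environments (3 and 6 runs per team per game) are made-up illustrative numbers, not measured ones. It compares how much win probability one extra run buys in each environment.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k runs when the mean is lam (Poisson assumption)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def win_prob(mean_for, mean_against, bonus_runs=0, max_runs=60):
    """Chance of winning, counting a tie as half a win.

    bonus_runs is added to our score, e.g. 1 for a solo home run.
    Scores are truncated at max_runs, which is negligible for these means.
    """
    ours_pmf = [poisson_pmf(k, mean_for) for k in range(max_runs)]
    theirs_pmf = [poisson_pmf(k, mean_against) for k in range(max_runs)]
    total = 0.0
    for ours, p_ours in enumerate(ours_pmf):
        for theirs, p_theirs in enumerate(theirs_pmf):
            if ours + bonus_runs > theirs:
                total += p_ours * p_theirs
            elif ours + bonus_runs == theirs:
                total += 0.5 * p_ours * p_theirs
    return total

def win_value_of_extra_run(env_mean):
    """Win probability gained from one extra run between two evenly matched teams."""
    return win_prob(env_mean, env_mean, bonus_runs=1) - win_prob(env_mean, env_mean)

# Hypothetical environments: the run levels are assumptions for illustration.
LOW_SCORING = 3.0   # a "Pedro-like" game
HIGH_SCORING = 6.0  # a "Chan Ho-like" game

gain_low = win_value_of_extra_run(LOW_SCORING)
gain_high = win_value_of_extra_run(HIGH_SCORING)
print(f"extra run, low-scoring environment:  +{gain_low:.3f} wins")
print(f"extra run, high-scoring environment: +{gain_high:.3f} wins")
```

Under these assumptions the same single run is worth more wins in the low-scoring game, because ties and one-run margins are more common there.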
--MKT
How are Runs Really Created
August 16, 2002 - Mike Tamada
Walt Davis wrote an extremely good explanation of linear regression models, including this passage:
* * * In regression, a coefficient gives you the impact of adding that particularly variable to the model, after having removed all the influence of the other variables from both the dependent variable and the independent variable in question (aka "statistical control"). I * * *
To which Tangotiger replied:
* * * The problem is that if you freeze say all the hits, HR, etc, but leave the walk to be the independent variable in question, its value is dependent on the values of hits and HR. So, you freeze hits and HR at say 10 and 2, then the value of 1 walk might be .30, the value of 2 walks might average .32, the value of 3 walks might average .34. Furthermore, if you then freeze the hits at 11 and the HR at 1, all these values change. So, exactly what is the value of the walk? * * *
That is not a problem either, at least for models which are "linear" (using the broad definition of "linear in the parameters", as explained by Walt Davis). Tangotiger is describing an interaction: the value of a walk VARIES, depending on whether hits/HR are 10/2, or 11/1, or whatever.
The solution (not that it will always work, but it frequently works) is already contained in Walt Davis's original posting, in his replies to Points 1 and 2, especially his Point 1 reply:
* * * There's no problem mixing linear and non-linear terms in the same equation (assuming you have the rationale to back it up):
y = b0 + b1*ln(X1) + b2*X2 + b3*ln(X1)*X2 + b4*X3 + b5*X3^2 + e * * *
That 4th term (counting the intercept b0), ln(X1)*X2, is an example of an interactive term: you simply multiply X1 by X2 (you can take logs if your model calls for it, or not take logs), and include the product on the right-hand side along with X1 and X2.
The resulting regression equation will then measure not just the separate effects of X1 and X2 (where X1 might be walks, for example, and X2 home runs), but ALSO how the effect of each one changes at DIFFERENT LEVELS of the other (the effects of walks, CORRECTED for the different home run environments). With an unlogged X1, for instance, the value of one more walk becomes b1 + b3*X2 rather than a single constant.
The model is still linear and thus still has some limitations. Even including these interactive terms (and the 2nd-degree polynomial terms, as described by Walt Davis) may still not yield the correct model. Walt Davis's original reply, however, pointed the way to solutions which might cover those situations while still remaining within a linear model:
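Here is a minimal sketch of the idea, using only the Python standard library. The data are synthetic and noise-free, the "true" coefficients are invented for illustration (they are not estimates from real baseball data), and X1 is left unlogged to keep things simple. The helper `fit_ols` is a hypothetical name for a small least-squares solver written for this example.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Invented coefficients: intercept, walks, home runs, walks*HR interaction.
TRUE_B = (0.50, 0.30, 1.40, 0.02)

# Synthetic games: every combination of 0-6 walks and 0-3 home runs.
design, runs = [], []
for walks in range(7):
    for hr in range(4):
        x = [1.0, walks, hr, walks * hr]  # last column is the interaction term
        design.append(x)
        runs.append(sum(b * v for b, v in zip(TRUE_B, x)))

b0, b1, b2, b3 = fit_ols(design, runs)

def walk_value(hr):
    """Marginal value of one more walk at a given number of home runs."""
    return b1 + b3 * hr

for hr in (1, 2):
    print(f"value of a walk with {hr} HR: {walk_value(hr):.3f}")
```

The fitted equation is still linear in b0 through b3, yet the value of a walk now moves with the home run level, which is the situation Tangotiger described.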
* * * Modeling that would suggest other possibilities like a series of dummy variables representing different run-scoring eras or a multi-level random effects model. * * *
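The dummy-variable suggestion can be sketched as follows. The model assumed here is y = a_era + b*x: one intercept per era and a single shared slope. Rather than building the dummy columns explicitly, the sketch uses the equivalent shortcut of subtracting each era's means before estimating the slope, which yields the same least-squares estimates for this model. The era names, the data, and the coefficients are all hypothetical.

```python
from collections import defaultdict

def fit_with_era_dummies(observations):
    """Fit y = a_era + b*x, where observations is a list of (era, x, y).

    Equivalent to a regression on x plus one dummy variable per era,
    computed by demeaning x and y within each era.
    Returns (slope, {era: intercept}).
    """
    by_era = defaultdict(list)
    for era, x, y in observations:
        by_era[era].append((x, y))

    means = {
        era: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for era, pts in by_era.items()
    }

    num = den = 0.0
    for era, pts in by_era.items():
        mean_x, mean_y = means[era]
        for x, y in pts:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    slope = num / den

    intercepts = {era: mean_y - slope * mean_x for era, (mean_x, mean_y) in means.items()}
    return slope, intercepts

# Synthetic, noise-free data: two invented eras with different run levels
# but the same (invented) value of a walk.
TRUE_SLOPE = 0.30
TRUE_INTERCEPTS = {"low_scoring_era": 3.0, "high_scoring_era": 5.0}

games = [
    (era, walks, base + TRUE_SLOPE * walks)
    for era, base in TRUE_INTERCEPTS.items()
    for walks in range(7)
]

slope, intercepts = fit_with_era_dummies(games)
print(f"shared value of a walk: {slope:.3f}")
for era, a in intercepts.items():
    print(f"{era}: baseline runs {a:.3f}")
```

If the value of a walk itself differed by era, the next step would be to multiply the era dummies by walks, which is just another interactive term.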
It is possible that the different environments could become so different that separate equations or at least separate parameters might be required to adequately model the differences in those environments. And some situations are simply so non-linear that they require nonlinear models.
But Walt Davis's points and suggestions are very good ones. "Linear" models in fact have a lot of flexibility to cover seemingly non-linear situations.
--MKT
Copyright notice
Comments on this page were made by person(s) with the same handle, in various comments areas, following Tangotiger © material, on Baseball Primer. All content on this page remains the sole copyright of the author of those comments.
If you are the author, and you wish to have these comments removed from this site, please send me an email (tangotiger@yahoo.com), along with (1) the URL of this page, and (2) a statement that you are in fact the author of all comments on this page, and I will promptly remove them.